Abstract — We consider adaptive system identification problems with convex constraints and propose a family of regularized Least-Mean-Square (LMS) algorithms. We show that with a properly selected regularization parameter the regularized LMS provably dominates its conventional counterpart in terms of mean square deviations. We establish simple and closed-form expressions for choosing this regularization parameter. For identifying an unknown sparse system we propose sparse and group-sparse LMS algorithms, which are special examples of the regularized LMS family. Simulation results demonstrate the advantages of the proposed filters in both convergence rate and steady-state error under sparsity assumptions on the true coefficient vector.

Index Terms — LMS, NLMS, convex regularization, sparse system, group sparsity, $\ell_1$ norm



I. Introduction

The Least Mean Square (LMS) algorithm, introduced by Widrow and Hoff [1], is a popular method for adaptive system identification. Its applications include echo cancelation, channel equalization, interference cancelation and so forth. Although there exist algorithms with faster convergence rates, such as the Recursive Least Squares (RLS) methods, LMS-type methods are popular because of their ease of implementation, low computational cost and robustness.

In many scenarios, prior information about the unknown system is available. One important example is when the impulse response of the unknown system is known to be sparse, containing only a few large coefficients interspersed among many small ones. Exploiting such prior information can improve the filtering performance and has been investigated for several years. Early work includes heuristic online selection of active taps [2]–[4] and sequential partial updating [5], [6]; other algorithms assign proportional step sizes to different taps according to their magnitudes, such as the Proportionate Normalized LMS (PNLMS) and its variations [7], [8].

Motivated by LASSO [9] and recent progress in compressive sensing [10], [11], the authors in [12] introduced an $\ell_1$-type regularization to the LMS framework, resulting in two sparse LMS methods called ZA-LMS and RZA-LMS. This methodology was also applied to other adaptive filtering frameworks such as RLS [13], [14] and projection-based adaptive algorithms [15]. Inheriting the advantages of conventional LMS methods such as robustness and low computational cost, the sparse LMS filters were empirically demonstrated to achieve superior performance in both convergence rate and steady-state behavior, compared to the standard LMS, when the system is sparse. However, while the regularization parameter needs to be tuned, there is no systematic way to choose it. Furthermore, the analysis of [12] is only based on the $\ell_1$ penalty and is not applicable to other regularization schemes.

In this paper, we extend the methods presented in [12], [16] to a broad family of regularization penalties and consider LMS and Normalized LMS (NLMS) algorithms¹ under general convex constraints. In addition, we allow the convex constraints to be time-varying. This results in a regularized LMS/NLMS update equation with an additional sub-gradient term. We show that the regularized LMS provably dominates its conventional counterpart if a proper regularization parameter is selected. We also establish a simple and closed-form formula to choose this parameter. For white input signals, the proposed parameter selection guarantees dominance of the regularized LMS over the conventional LMS. Next, we show that the sparse LMS filters in [12], i.e., ZA-LMS and RZA-LMS, can be obtained as special cases of the regularized LMS family introduced here. Furthermore, we consider a group-sparse adaptive FIR filter response that is useful for practical applications [8], [11]. To enforce group sparsity we use $\ell_{1,2}$-type regularization functions [18] in the regularized LMS framework. For sparse and group-sparse LMS methods, we propose alternative closed-form expressions for selecting the regularization parameters. This guarantees provable dominance for both white and correlated input signals. Finally, we demonstrate performance advantages of our proposed sparse and group-sparse LMS filters using numerical simulation. In particular, we show that the regularized LMS method is robust to model misspecification and outperforms the contemporary projection-based methods [15] at equivalent computational cost.

The paper is organized as follows. Section II formulates the problem and introduces the regularized LMS algorithm. In Section III we develop LMS filters for sparse and group-sparse system identification. Section IV provides numerical simulation results and Section V summarizes our principal conclusions. The proofs of the theorems are provided in the Appendix.

Notation: In the remainder of the paper, matrices and vectors are denoted by boldface upper-case letters and boldface lower-case letters, respectively; $(\cdot)^T$ denotes the transpose operator, and $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the $\ell_1$ and $\ell_2$ norms of a vector, respectively.

¹ We treat NLMS as a special case of the general LMS algorithm and will not distinguish the two unless required for clarity.
II. Regularized LMS

A. LMS framework 

We begin by briefly reviewing the framework of the LMS 
filter, which forms the basis of our derivations to follow. 
Denote the coefficient vector and the input signal vector of the adaptive filter as

$$\mathbf{w}_n = [w_{n,0}, w_{n,1}, \ldots, w_{n,N-1}]^T \tag{1}$$

and

$$\mathbf{x}_n = [x_n, x_{n-1}, \ldots, x_{n-N+1}]^T, \tag{2}$$

respectively, where $n$ is the time index, $x_n$ is the input signal, $w_{n,i}$ is the $i$-th coefficient at time $n$ and $N$ is the length of the filter. The goal of the LMS algorithm is to identify the true system impulse response $\mathbf{w}$ from the input signal $x_n$ and the desired output signal $y_n$, where

$$y_n = \mathbf{w}^T\mathbf{x}_n + v_n \tag{3}$$

and $v_n$ is the observation noise, which is assumed to be independent of $x_n$.

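Let $e_n$ denote the instantaneous error between the filter output $\mathbf{w}_n^T\mathbf{x}_n$ and the desired output $y_n$:

$$e_n = y_n - \mathbf{w}_n^T\mathbf{x}_n. \tag{4}$$

In the standard LMS framework, the cost function $L_n$ is defined as the instantaneous square error

$$L_n(\mathbf{w}_n) = \frac{1}{2}e_n^2,$$

and the filter coefficient vector is updated in a stochastic gradient descent manner:

$$\mathbf{w}_{n+1} = \mathbf{w}_n - \mu_n\nabla L_n(\mathbf{w}_n) = \mathbf{w}_n + \mu_n e_n\mathbf{x}_n, \tag{5}$$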

where $\mu_n$ is the step size controlling the convergence and the steady-state behavior of the LMS algorithm. We refer to (5) as the conventional LMS algorithm and emphasize that $\mu_n$ can be both time-varying and a function of $\mathbf{x}_n$. For example,

$$\mu_n = \frac{\alpha_n}{\|\mathbf{x}_n\|_2^2} \tag{6}$$

yields the normalized LMS (NLMS) algorithm with variable step size $\alpha_n$.
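For concreteness, here is a minimal Python sketch of the conventional updates (4) through (6). It illustrates the equations under stated assumptions and is not the authors' code. The helper names `lms_step` and `nlms_step_size`, the 8-tap system, the noise level and the iteration count are all hypothetical choices.

```python
import random


def lms_step(w, x, y, mu):
    """One conventional LMS update, eq. (5): w_{n+1} = w_n + mu_n e_n x_n."""
    e = y - sum(wi * xi for wi, xi in zip(w, x))  # instantaneous error e_n, eq. (4)
    return [wi + mu * e * xi for wi, xi in zip(w, x)], e


def nlms_step_size(x, alpha):
    """NLMS step size, eq. (6): mu_n = alpha_n / ||x_n||_2^2 (x_n must be non-zero)."""
    return alpha / sum(xi * xi for xi in x)


# Hypothetical example: identify an 8-tap FIR system with NLMS (alpha_n = 0.5).
rng = random.Random(0)
w_true = [0.0, 1.0, 0.0, 0.0, -0.5, 0.0, 0.0, 0.0]
N = len(w_true)
w = [0.0] * N
x_vec = [rng.gauss(0.0, 1.0) for _ in range(N)]  # pre-filled delay line, eq. (2)
for _ in range(2000):
    x_vec = [rng.gauss(0.0, 1.0)] + x_vec[:-1]    # shift in the newest sample x_n
    y = sum(a * b for a, b in zip(w_true, x_vec)) + rng.gauss(0.0, 0.1)  # eq. (3)
    w, _ = lms_step(w, x_vec, y, nlms_step_size(x_vec, alpha=0.5))

sq_dev = sum((a - b) ** 2 for a, b in zip(w, w_true))
print(f"squared deviation after 2000 iterations: {sq_dev:.1e}")
```

To get plain LMS, pass a constant `mu` instead of `nlms_step_size(...)`. The NLMS variant is used here because it needs no knowledge of the input power.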

B. Regularized LMS 

Conventional LMS algorithms do not impose any model on the true system response $\mathbf{w}$. However, in practical scenarios prior knowledge of $\mathbf{w}$ is often available. For example, if the system is known to be sparse, the $\ell_1$ norm of $\mathbf{w}$ can be upper bounded by some constant [9]. In this work, we study the adaptive system identification problem where the true system is constrained by

$$f_n(\mathbf{w}) \le \eta_n, \tag{7}$$

where $f_n(\cdot)$ is a convex function and $\eta_n$ is a constant. We note that the subscript $n$ in $f_n(\cdot)$ allows adaptive constraints that can vary in time. Based on (7) we propose a regularized instantaneous cost function

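$$L_n^{\mathrm{reg}}(\mathbf{w}_n) = \frac{1}{2}e_n^2 + \gamma_n f_n(\mathbf{w}_n) \tag{8}$$

and update the coefficient vector by

$$\mathbf{w}_{n+1} = \mathbf{w}_n - \mu_n\nabla L_n^{\mathrm{reg}}(\mathbf{w}_n) = \mathbf{w}_n + \mu_n e_n\mathbf{x}_n - \rho_n\partial f_n(\mathbf{w}_n), \tag{9}$$

where $\partial f_n(\cdot)$ is a sub-gradient of the convex function $f_n(\cdot)$, $\gamma_n$ is the regularization parameter and $\rho_n = \gamma_n\mu_n$.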
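The regularized update (9) differs from (5) only by the sub-gradient term. The sketch below keeps the penalty abstract by passing in a sub-gradient function. `regularized_lms_step` and `l1_subgrad` are hypothetical names, and the numbers are arbitrary.

```python
def regularized_lms_step(w, x, y, mu, rho, subgrad):
    """Regularized LMS update, eq. (9): w_{n+1} = w_n + mu_n e_n x_n - rho_n * df_n(w_n).

    `subgrad` is any callable returning a sub-gradient of the convex penalty f_n at w_n.
    """
    e = y - sum(wi * xi for wi, xi in zip(w, x))  # e_n, eq. (4)
    g = subgrad(w)
    return [wi + mu * e * xi - rho * gi for wi, xi, gi in zip(w, x, g)]


def l1_subgrad(w):
    """A sub-gradient of ||w||_1: the sign of each entry, taking 0 where w_i = 0."""
    return [float((wi > 0) - (wi < 0)) for wi in w]


# Here e_n = 0, so only the regularization term moves the coefficients.
w_next = regularized_lms_step([0.5, -0.2, 0.0], [1.0, 0.0, 0.0], 0.5, 0.1, 0.01, l1_subgrad)
print(w_next)
```

With `rho = 0` the function reduces exactly to the conventional LMS step (5). Swapping `l1_subgrad` for another sub-gradient changes the penalty without touching the update loop.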

Eq. (9) is the proposed regularized LMS. Compared to its conventional counterpart, the regularization term $-\rho_n\partial f_n(\mathbf{w}_n)$ always encourages the coefficient vector to satisfy the constraint (7). The parameter $\rho_n$ is referred to as the regularization step size. Instead of tuning $\rho_n$ in an ad hoc manner, we establish a systematic approach to choosing $\rho_n$.

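Theorem 1. Assume both $\{x_n\}$ and $\{v_n\}$ are Gaussian independent and identically distributed (i.i.d.) processes that are mutually independent. For any $n \ge 1$,

$$E\|\mathbf{w}_n - \mathbf{w}\|_2^2 \le E\|\mathbf{w}_n' - \mathbf{w}\|_2^2 \tag{10}$$

if $\mathbf{w}_0 = \mathbf{w}_0'$ and $\rho_n \in [0, 2\rho_n^*]$. Here $\mathbf{w}$ is the true coefficient vector, and $\mathbf{w}_n$ and $\mathbf{w}_n'$ are the filter coefficients updated by (9) and (5), respectively, with the same step size $\mu_n$. The parameter $\rho_n^*$ is calculated by

$$\rho_n^* = \frac{(1 - \mu_n\sigma_x^2)\max\{f_n(\mathbf{w}_n) - \eta_n, 0\}}{\|\partial f_n(\mathbf{w}_n)\|_2^2} \tag{11}$$

if $\mu_n$ are constant values (LMS), or

$$\rho_n^* = \frac{(1 - \alpha_n/N)\max\{f_n(\mathbf{w}_n) - \eta_n, 0\}}{\|\partial f_n(\mathbf{w}_n)\|_2^2} \tag{12}$$

if $\mu_n$ is chosen using (6) (NLMS). Here $N$ is the filter length, $\sigma_x^2$ is the variance of $\{x_n\}$ and $\eta_n$ is an upper bound of $f_n(\mathbf{w})$ defined in (7).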

The proof of Theorem 1 is provided in the Appendix.

Remark 1. Theorem 1 shows that with the same initial condition and step size $\mu_n$, the regularized LMS algorithm provably dominates conventional LMS when the input signal is white. The parameter $\rho_n^*$ in (11) or (12) can be used as the value of $\rho_n$ in (9) to guarantee that regularized LMS will have lower MSD than conventional LMS. The value $\rho_n^*$ only requires specification of the input signal variance $\sigma_x^2$ (for LMS) and of $\eta_n$, which upper bounds the true value $f_n(\mathbf{w})$. Simulations in later sections show that the performance of the regularized LMS is robust to misspecified values of $\eta_n$.
Remark 2. Eqs. (11) and (12) indicate that, to ensure superiority, the regularization is only "triggered" if $f_n(\mathbf{w}_n) > \eta_n$. When $f_n(\mathbf{w}_n) \le \eta_n$, $\rho_n^* = 0$ and the regularized LMS reduces to the conventional LMS.
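Remarks 1 and 2 translate into a few lines of code. The sketch below evaluates (11) and (12) for the $\ell_1$ penalty using a hypothetical helper `rho_star` and made-up numbers. It returns $\rho_n^*$ itself, which lies inside the admissible interval $[0, 2\rho_n^*]$ of Theorem 1. Treating a zero sub-gradient as $\rho_n^* = 0$ is our own convention, since the regularization term then vanishes anyway.

```python
def rho_star(f_w, eta, subgrad_sq_norm, gain):
    """Common form of eqs. (11)-(12): gain * max(f_n(w_n) - eta_n, 0) / ||df_n(w_n)||_2^2.

    gain = 1 - mu_n * sigma_x^2 for LMS (11); gain = 1 - alpha_n / N for NLMS (12).
    Returning 0 for a zero sub-gradient is an added convention (the term in (9) is 0 then).
    """
    if subgrad_sq_norm == 0.0:
        return 0.0
    return gain * max(f_w - eta, 0.0) / subgrad_sq_norm


w = [0.3, -0.4, 0.0, 0.1]
f_w = sum(abs(wi) for wi in w)             # f_n(w_n) = ||w_n||_1
g_sq = sum(1.0 for wi in w if wi != 0.0)   # ||sgn(w_n)||_2^2 = number of non-zero taps
rho_lms = rho_star(f_w, 0.5, g_sq, gain=1 - 0.01 * 1.0)     # (11) with mu_n = 0.01, sigma_x^2 = 1
rho_nlms = rho_star(f_w, 0.5, g_sq, gain=1 - 1.0 / len(w))  # (12) with alpha_n = 1, N = 4
print(rho_lms, rho_nlms)
```

As Remark 2 states, calling `rho_star` with `f_w <= eta` returns 0, so the filter falls back to conventional LMS.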

Remark 3. The closed-form expression for $\rho_n^*$ is derived based on the white input assumption. Simulation results in later sections show that (11) and (12) are also empirically good choices even for correlated input signals. Indeed, in the next section we will show that provable dominance can be guaranteed for correlated inputs when the regularization function is suitably selected.

Fig. 1. Examples of (a) a general sparse system and (b) a group-sparse 
system. 



III. Sparse system identification 

A sparse system contains only a few large coefficients interspersed among many negligible ones. Such sparse systems arise in many applications such as digital TV transmission channels [17] and acoustic echo channels [8]. Sparse systems can be further divided into general sparse systems and group-sparse systems, as shown in Fig. 1(a) and Fig. 1(b), respectively. Here we apply our regularized LMS to both general and group-sparse system identification. We show that ZA-LMS and RZA-LMS in [12] are special examples of regularized LMS. We then propose group-sparse LMS algorithms for identifying group-sparse systems.

A. Sparse LMS 

For a general sparse system, the locations of active non-zero coefficients are unknown, but one may know an upper bound on their number. Specifically, we will assume that the impulse response $\mathbf{w}$ satisfies

$$\|\mathbf{w}\|_0 \le k, \tag{13}$$

where $\|\cdot\|_0$ is the $\ell_0$ norm denoting the number of non-zero entries of a vector, and $k$ is a known upper bound. As the $\ell_0$ norm is non-convex, it is not suited to the proposed framework. Following [9] and [10], we instead adopt the $\ell_1$ norm as a surrogate approximation to the $\ell_0$ norm:

$$\|\mathbf{w}\|_1 = \sum_{i=0}^{N-1}|w_i|. \tag{14}$$

Using the regularization penalty $f_n(\mathbf{w}) = \|\mathbf{w}\|_1$ in the regularized LMS (9), we obtain

$$\mathbf{w}_{n+1} = \mathbf{w}_n + \mu_n e_n\mathbf{x}_n - \rho_n\,\mathrm{sgn}(\mathbf{w}_n), \tag{15}$$

where the component-wise $\mathrm{sgn}(\cdot)$ function is defined as

$$\mathrm{sgn}(x) = \begin{cases} x/|x|, & x \ne 0,\\ 0, & x = 0.\end{cases} \tag{16}$$

Equation (15) yields the ZA-LMS introduced in [12]. The regularization parameter $\rho_n$ can be calculated by (11) for LMS and by (12) for NLMS, where $f_n(\mathbf{w}_n) = \|\mathbf{w}_n\|_1$ and $\eta_n$ is an estimate of the true $\|\mathbf{w}\|_1$.
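Putting (6), (12), (15) and (16) together gives one ZA-NLMS iteration. The function below is a sketch with a hypothetical name. The argument `eta` stands for the user's estimate $\eta_n$ of $\|\mathbf{w}\|_1$.

```python
def za_nlms_step(w, x, y, alpha, eta):
    """One ZA-NLMS iteration: update (15), step size (6), rho_n from (12) with f_n = ||.||_1."""
    N = len(w)
    mu = alpha / sum(xi * xi for xi in x)               # NLMS step size, eq. (6)
    e = y - sum(wi * xi for wi, xi in zip(w, x))        # e_n, eq. (4)
    s = [float((wi > 0) - (wi < 0)) for wi in w]        # sgn(w_n), eq. (16)
    g_sq = sum(si * si for si in s)                     # ||sgn(w_n)||_2^2
    l1 = sum(abs(wi) for wi in w)                       # f_n(w_n) = ||w_n||_1
    rho = (1 - alpha / N) * max(l1 - eta, 0.0) / g_sq if g_sq else 0.0  # eq. (12)
    return [wi + mu * e * xi - rho * si for wi, xi, si in zip(w, x, s)]  # eq. (15)


# ||w_n||_1 = 2 exceeds eta_n = 1, so the zero attractor is triggered (Remark 2).
print(za_nlms_step([1.0, -1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], 1.0, alpha=0.5, eta=1.0))
```

In this example $e_n = 0$, so the result isolates the shrinkage. Both active taps move toward zero by $\rho_n^* = 0.4375$, and the zero taps are left untouched.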



An alternative approach to approximating the $\ell_0$ norm is to consider the following function [12], [15], [19]:

$$\|\mathbf{w}\|_0 \simeq \sum_{i=0}^{N-1}\frac{|w_i|}{|w_i| + \delta}, \tag{17}$$

where $\delta$ is a sufficiently small positive real number. Interpreting (17) as a weighted $\ell_1$ approximation, we propose the regularization function

$$f_n(\mathbf{w}) = \sum_{i=0}^{N-1}\beta_{n,i}|w_i|, \tag{18}$$

where

$$\beta_{n,i} = \frac{1}{|w_{n,i}| + \delta} \tag{19}$$

and $w_{n,i}$ is the $i$-th coefficient of $\mathbf{w}_n$ defined in (1). Using (18) in (9) yields

$$w_{n+1,i} = w_{n,i} + \mu_n e_n x_{n-i} - \rho_n\beta_{n,i}\,\mathrm{sgn}(w_{n,i}), \tag{20}$$

which is a component-wise update of the RZA-LMS proposed in [12]. Again, $\rho_n$ can be computed using (11) for LMS or (12) for NLMS, where $\eta_n$ is an estimate of the true $\|\mathbf{w}\|_0$, i.e., the number of non-zero coefficients.
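Similarly, here is a sketch of one RZA-NLMS iteration: (18) to (20), with $\rho_n$ from (12). The function name is hypothetical, and `delta` is the constant $\delta$ of (17) to (19). The check below uses $\delta = 1$ only to keep the arithmetic exact, whereas the text calls for a small $\delta$. The weight schedule used in the simulations, which follows [15], is not reproduced here.

```python
def rza_nlms_step(w, x, y, alpha, eta, delta):
    """One RZA-NLMS iteration: component-wise update (20), weights (19), rho_n from (12).

    f_n(w) = sum_i beta_{n,i} |w_i| (18), so beta_{n,i} * sgn(w_{n,i}) is a sub-gradient at w_n.
    """
    N = len(w)
    mu = alpha / sum(xi * xi for xi in x)                   # NLMS step size, eq. (6)
    e = y - sum(wi * xi for wi, xi in zip(w, x))            # e_n, eq. (4)
    beta = [1.0 / (abs(wi) + delta) for wi in w]            # eq. (19)
    g = [b * float((wi > 0) - (wi < 0)) for b, wi in zip(beta, w)]  # sub-gradient of (18)
    f_w = sum(b * abs(wi) for b, wi in zip(beta, w))        # f_n(w_n), close to ||w_n||_0 for small delta
    g_sq = sum(gi * gi for gi in g)
    rho = (1 - alpha / N) * max(f_w - eta, 0.0) / g_sq if g_sq else 0.0  # eq. (12)
    return [wi + mu * e * xi - rho * gi for wi, xi, gi in zip(w, x, g)]  # eq. (20)


print(rza_nlms_step([1.0, 0.0, 0.0, -1.0], [1.0, 0.0, 0.0, 0.0], 1.0,
                    alpha=0.5, eta=0.5, delta=1.0))
```

Compared with ZA-NLMS, the attraction toward zero is scaled per tap by $\beta_{n,i}$. Taps that are already small are therefore pulled harder than large ones.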

B. Group-sparse LMS 

In many practical applications, a sparse system exhibits a grouping structure, i.e., coefficients in the same group are highly correlated and take on zero or non-zero values as a group, as shown in Fig. 1(b). The motivation for developing group-sparse LMS is to take advantage of such a structure.

We begin by employing the mixed $\ell_{1,2}$ norm for promoting group sparsity, which was originally proposed in [18] and has been widely adopted for various structured sparse regression problems [10], [11]. The $\ell_{1,2}$ norm of a vector $\mathbf{w}$ is defined as

$$\|\mathbf{w}\|_{1,2} = \sum_{j=1}^{J}\|\mathbf{w}_{I_j}\|_2, \tag{21}$$

where $\{I_j\}_{j=1}^{J}$ is a group partition of the whole index set $I = \{0, 1, \ldots, N-1\}$:

$$I = \bigcup_{j=1}^{J} I_j, \qquad I_j \cap I_{j'} = \emptyset \ \text{when}\ j \ne j', \tag{22}$$

and $\mathbf{w}_{I_j}$ is a sub-vector of $\mathbf{w}$ indexed by $I_j$. The $\ell_{1,2}$ norm is a mixed norm: it encourages correlation among coefficients inside each group via the $\ell_2$ norm and promotes sparsity across those groups using the $\ell_1$ norm. $\|\mathbf{w}\|_{1,2}$ is convex in $\mathbf{w}$ and reduces to $\|\mathbf{w}\|_1$ when each group contains only one coefficient, i.e.,

$$|I_1| = |I_2| = \cdots = |I_J| = 1, \tag{23}$$



where $|\cdot|$ denotes the cardinality of a set. Employing $f_n(\mathbf{w}) = \|\mathbf{w}\|_{1,2}$, the $\ell_{1,2}$-regularized LMS, which we refer to as GZA-LMS, is

$$\mathbf{w}_{n+1,I_j} = \mathbf{w}_{n,I_j} + \mu_n e_n\mathbf{x}_{n,I_j} - \rho_n\frac{\mathbf{w}_{n,I_j}}{\|\mathbf{w}_{n,I_j}\|_2 + \delta}, \quad j = 1, \ldots, J, \tag{24}$$

where $\mathbf{x}_{n,I_j}$ is the sub-vector of $\mathbf{x}_n$ indexed by $I_j$ and $\delta$ is a sufficiently small number ensuring a non-zero denominator. To the best of our knowledge, this is the first time that the $\ell_{1,2}$ norm has been proposed for LMS adaptive filters.

Fig. 2. A toy example illustrating the $\ell_{1,2}$ norm of a $16 \times 1$ coefficient vector $\mathbf{w}$: $\|\mathbf{w}\|_{1,2} = \sum_{j=1}^{J}\|\mathbf{w}_{I_j}\|_2$.

To further promote group selection, we consider the following weighted $\ell_{1,2}$ regularization as a group-wise generalization of (18):

$$f_n(\mathbf{w}) = \sum_{j=1}^{J}\beta_{n,j}\|\mathbf{w}_{I_j}\|_2, \tag{25}$$

where $\beta_{n,j}$ is a re-weighting parameter defined by

$$\beta_{n,j} = \frac{1}{\|\mathbf{w}_{n,I_j}\|_2 + \delta}, \tag{26}$$

and the corresponding regularized LMS update is then

$$\mathbf{w}_{n+1,I_j} = \mathbf{w}_{n,I_j} + \mu_n e_n\mathbf{x}_{n,I_j} - \rho_n\beta_{n,j}\frac{\mathbf{w}_{n,I_j}}{\|\mathbf{w}_{n,I_j}\|_2 + \delta}, \quad j = 1, \ldots, J, \tag{27}$$

which is referred to as GRZA-LMS.

As both the $\ell_{1,2}$ norm and the weighted $\ell_{1,2}$ norm are convex, Theorem 1 applies under the assumption of white input signals, and $\rho_n$ can be calculated by (11) or (12). The parameter $\eta_n$ can be chosen as an estimate of the true $\|\mathbf{w}\|_{1,2}$ for GZA-LMS (24), or as the number of non-zero groups of $\mathbf{w}$ for GRZA-LMS (27).

Finally, we note that GZA-LMS and GRZA-LMS reduce to ZA-LMS and RZA-LMS, respectively, if each group contains only one element.

C. Choosing the regularization parameter for correlated input

Theorem 1 gives a closed-form expression for $\rho_n$, and (11) and (12) are applicable to any convex $f_n(\mathbf{w})$. However, the dominance over conventional LMS is only guaranteed when the input signal is white. Here we develop an alternative formula to determine $\rho_n$ that applies to correlated input signals for sparse and group-sparse LMS, i.e., (15), (20), (24) and (27).

We begin by considering the weighted $\ell_{1,2}$ regularization (25) and the corresponding GRZA-LMS update (27). Indeed, the other three algorithms, i.e., (24), (20) and (15), can be treated as special cases of (27). For general wide-sense stationary (WSS) input signals, the regularization parameter $\rho_n$ of (27) can be selected according to the following theorem.

Theorem 2. Assume $\{x_n\}$ and $\{v_n\}$ are WSS stochastic processes which are mutually independent. Let $\mathbf{w}_n$ and $\mathbf{w}_n'$ be filter coefficients updated by (27) and (5), respectively, with the same $\mu_n$. Then,

$$E\|\mathbf{w}_{n+1} - \mathbf{w}\|_2^2 \le E\|\mathbf{w}_{n+1}' - \mathbf{w}\|_2^2 \tag{28}$$

if $\mathbf{w}_n = \mathbf{w}_n'$ and $\rho_n \in [0, 2\rho_n^*]$, where $\mathbf{w}$ is the true coefficient vector and $\rho_n^*$ is

$$\rho_n^* = \frac{\max\{f_n(\mathbf{w}_n) - \eta_n - \mu_n r_n, 0\}}{\|\partial f_n(\mathbf{w}_n)\|_2^2}. \tag{29}$$

Here $f_n(\mathbf{w}_n)$ is determined by (25), $\eta_n$ is an upper bound of $f_n(\mathbf{w})$, and

$$r_n = \mathbf{w}_n^T\mathbf{x}_n\cdot\mathbf{x}_n^T\partial f_n(\mathbf{w}_n) + \eta_n\max_j\frac{\|\mathbf{x}_{n,I_j}\|_2}{\beta_{n,j}}\,\big|\mathbf{x}_n^T\partial f_n(\mathbf{w}_n)\big|. \tag{30}$$

The proof of Theorem 2 can be found in the Appendix. We make the following remarks.

Remark 4. Theorem 2 is derived from the general form (27) and can be directly specialized to (24), (20) and (15). Specifically,
• GZA-LMS (24) can be obtained by assigning $\beta_{n,j} = 1$;
• RZA-LMS (20) can be obtained when $|I_j| = 1$, $j = 1, \ldots, J$;
• ZA-LMS (15) can be obtained when both $|I_j| = 1$, $j = 1, \ldots, J$, and $\beta_{n,j} = 1$.

Remark 5. Theorem 2 is valid for any WSS input signals. However, the dominance result in (28) is weaker than that in Theorem 1, as it requires $\mathbf{w}_n = \mathbf{w}_n'$ at each iteration.

Remark 6. Eq. (29) can be applied to both LMS and NLMS, since $\mu_n$ may be either a constant or a deterministic function of $\mathbf{x}_n$ as specified in (6). This is different from Theorem 1, where we have separate expressions for LMS and NLMS.

Remark 7. $\rho_n^*$ in (29) is non-zero only if $f_n(\mathbf{w}_n)$ is greater than $\eta_n + \mu_n r_n$ (rather than $\eta_n$, as in Theorem 1). This may yield a more conservative performance.
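The group-sparse update (27) and the correlated-input parameter (29) and (30) can be sketched as follows for the NLMS step size (6). The function name and the list-of-index-lists representation of the partition $\{I_j\}$ are assumptions made for illustration. The sub-gradient direction is taken to be exactly the term multiplying $\rho_n$ in (27). The special cases of Remark 4 are not implemented separately.

```python
import math


def grza_nlms_step(w, x, y, groups, alpha, eta, delta):
    """One GRZA-NLMS iteration, eq. (27), with rho_n = rho_n^* from eqs. (29)-(30).

    `groups` lists the index sets I_j of a partition of range(len(w)), eq. (22).
    The sub-gradient used is the term multiplying rho_n in (27):
    g_{I_j} = beta_{n,j} * w_{n,I_j} / (||w_{n,I_j}||_2 + delta).
    """
    mu = alpha / sum(xi * xi for xi in x)          # NLMS step size, eq. (6)
    wx = sum(wi * xi for wi, xi in zip(w, x))       # w_n^T x_n
    e = y - wx                                      # e_n, eq. (4)
    g = [0.0] * len(w)
    f_w = 0.0         # f_n(w_n), the weighted l_{1,2} norm of eq. (25)
    ratio_max = 0.0   # max_j ||x_{n,I_j}||_2 / beta_{n,j}, as in eq. (30)
    for idx in groups:
        w_norm = math.sqrt(sum(w[i] ** 2 for i in idx))
        beta = 1.0 / (w_norm + delta)               # eq. (26)
        f_w += beta * w_norm
        for i in idx:
            g[i] = beta * w[i] / (w_norm + delta)
        x_norm = math.sqrt(sum(x[i] ** 2 for i in idx))
        ratio_max = max(ratio_max, x_norm / beta)
    g_sq = sum(gi * gi for gi in g)                 # ||df_n(w_n)||_2^2
    xg = sum(xi * gi for xi, gi in zip(x, g))       # x_n^T df_n(w_n)
    r = wx * xg + eta * ratio_max * abs(xg)         # eq. (30)
    rho = max(f_w - eta - mu * r, 0.0) / g_sq if g_sq else 0.0  # eq. (29)
    return [wi + mu * e * xi - rho * gi for wi, xi, gi in zip(w, x, g)]


groups = [[0, 1], [2, 3]]
w_next = grza_nlms_step([0.6, 0.8, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], 0.6, groups,
                        alpha=0.5, eta=0.25, delta=1.0)
print(w_next)  # approximately [0.198, 0.264, 0.0, 0.0]
```

The term $\mu_n r_n$ is what makes (29) more conservative than (12), as noted in Remark 7. In this example it reduces the margin $f_n(\mathbf{w}_n) - \eta_n = 0.25$ to $0.1675$.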

IV. Numerical simulations 

In this section we demonstrate our proposed sparse LMS algorithms by numerical simulations. Multiple experiments are designed to evaluate their performance over a wide range of conditions.

A. Identifying a general sparse system 

Here we evaluate the proposed filters for general sparse system identification, as illustrated in Fig. 1(a). There are 100 coefficients in the unknown system and only five of them are non-zero. The five non-zero coefficients are assigned to random locations and their values are also randomly drawn from a standard Gaussian distribution. The resultant true coefficient vector is plotted in Fig. 3.

1) White input signals: Initially we simulate a white Gaussian input signal $\{x_n\}$ with zero mean and unit variance. The measurement noise $\{v_n\}$ is an independent Gaussian random process of zero mean and variance $\sigma_v^2 = 0.1$. For ease of parameter selection, we implement NLMS-type filters in our simulation. Three filters (NLMS, ZA-NLMS and RZA-NLMS) are implemented, and their common step size $\mu_n$ is set via (6) with $\alpha_n = 1$. The regularization parameter $\rho_n$ is computed using (12), where $\eta_n$ is set to $\eta_n = \|\mathbf{w}\|_1$ (i.e., the true value) for ZA-NLMS and to $\eta_n = 5$ for RZA-NLMS. For comparison, we also implement a recently proposed sparse adaptive filter, referred to as APWL1 [15], which sequentially projects the coefficient vector onto weighted $\ell_1$ balls. We note that our simulation setting is identical to that used in [15], and thus we adopt the same tuning parameters for APWL1. In addition, the weights $\beta_{n,i}$ for RZA-NLMS are scheduled in the same manner as in [15] for a fair comparison. The simulations are run 100 times and the average estimates of the mean square deviation (MSD) are shown in Fig. 4.

Fig. 3. The general sparse system used for simulations.

Fig. 4. White input signals: performance comparison for different filters.
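A single-trial version of the white-input experiment can be sketched in pure Python. This is a scaled-down illustration, not a reproduction of Fig. 4. It compares only NLMS and ZA-NLMS over one trial of 600 iterations, and RZA-NLMS and APWL1 are omitted. The seed, the pre-filled delay line and the helper names are hypothetical choices.

```python
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def run_trial(n_iter=600, N=100, k=5, noise_var=0.1, alpha=1.0, seed=1):
    """One trial: return squared deviations ||w_n - w||_2^2 (initial, NLMS, ZA-NLMS)."""
    rng = random.Random(seed)
    w_true = [0.0] * N
    for i in rng.sample(range(N), k):         # k active taps at random locations
        w_true[i] = rng.gauss(0.0, 1.0)       # values drawn from a standard Gaussian
    eta = sum(abs(c) for c in w_true)         # eta_n = ||w||_1, the true value
    w_nlms = [0.0] * N
    w_za = [0.0] * N
    x = [rng.gauss(0.0, 1.0) for _ in range(N)]  # pre-filled delay line
    for _ in range(n_iter):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]                       # white unit-variance input
        y = dot(w_true, x) + rng.gauss(0.0, noise_var ** 0.5)    # eq. (3)
        mu = alpha / dot(x, x)                                   # eq. (6)
        e = y - dot(w_nlms, x)
        w_nlms = [wi + mu * e * xi for wi, xi in zip(w_nlms, x)]  # NLMS, eq. (5)
        e = y - dot(w_za, x)
        s = [float((wi > 0) - (wi < 0)) for wi in w_za]          # sgn(w_n), eq. (16)
        g_sq = dot(s, s)
        l1 = sum(abs(wi) for wi in w_za)
        rho = (1 - alpha / N) * max(l1 - eta, 0.0) / g_sq if g_sq else 0.0  # eq. (12)
        w_za = [wi + mu * e * xi - rho * si for wi, xi, si in zip(w_za, x, s)]  # eq. (15)

    def sq_dev(w):
        return sum((a - b) ** 2 for a, b in zip(w, w_true))

    return sq_dev([0.0] * N), sq_dev(w_nlms), sq_dev(w_za)


init, msd_nlms, msd_za = run_trial()
print(f"initial {init:.3f}, NLMS {msd_nlms:.4f}, ZA-NLMS {msd_za:.4f}")
```

A single trial is noisy. Reproducing curves like those in Fig. 4 would require averaging `run_trial` over many seeds and recording the deviation at every iteration.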

It can be observed that ZA-NLMS improves upon NLMS in both convergence rate and steady-state behavior, and RZA-NLMS does even better. The parameter $q$ of APWL1 is the number of samples used in each iteration. One can see that RZA-NLMS outperforms APWL1 when $q = 1$, i.e., when APWL1 operates with the same memory storage as RZA-NLMS. With larger $q$, APWL1 begins to perform better and exceeds RZA-NLMS when $q > 10$. However, there is a trade-off between system complexity and filtering performance. APWL1 requires $O(qN)$ for memory storage and $O(N\log_2 N + qN)$ for computation, whereas LMS-type methods require only $O(N)$ for both memory and computation.

Next, we investigate the sensitivity of ZA-NLMS and RZA-NLMS to $\eta_n$. The result shown in Fig. 5 indicates that ZA-NLMS is more sensitive to $\eta_n$ than RZA-NLMS, which is highly robust to misspecified $\eta_n$.

Further analysis reveals that projection-based methods such as APWL1 may exhibit unstable convergence behavior. Fig. 6 shows two independent trials of the simulation implemented in Fig. 4. It can be seen that there exist several local minima in APWL1. For example, Fig. 6(b) seems to indicate that APWL1 ($q = 10$) converges at the 400th iteration with MSD $\approx -12$ dB, yet its MSD actually reaches values as low as $-25$ dB at the 900th iteration. This slow convergence phenomenon is due to the fact that the weighted $\ell_1$ ball is determined in an online fashion and the projection operator is sensitive to misspecifications of the convex set. In contrast, our regularized LMS uses a sub-gradient rather than a projection to pursue sparsity, translating into improved convergence.

Fig. 5. Sensitivity of ZA-NLMS and RZA-NLMS to $\eta_n$: MSD for ZA-NLMS and RZA-NLMS at the 750th iteration for white input signals.

2) Correlated input signals: Next, we evaluate the filtering performance using correlated input signals. We generate the sequence $\{x_n\}$ as an AR(1) process

$$x_n = 0.8x_{n-1} + u_n, \tag{31}$$

which is then normalized to unit variance, where $\{u_n\}$ is a Gaussian i.i.d. process. The measurement system is the same as before and the variance of the noise is also $\sigma_v^2 = 0.1$.

We compare our RZA-NLMS with APWL1 ($q = 10$), and include standard NLMS as a benchmark. All the filter parameters are set to the same values as in the previous simulation, except that we employ both (12) and (29) to calculate $\rho_n$ in RZA-NLMS. The simulations are run 100 times and the average MSD curves are plotted in Fig. 7. Although Theorem 1 is derived under the white input assumption, using (12) to determine $\rho_n$ achieves empirically better performance than using (29), whose use guarantees dominance but yields a conservative result. This confirms our conjecture in Remark 7. We also observe a severe performance degradation of APWL1 for correlated input signals. Fig. 8 shows two independent trials in this simulation. The phenomenon described in Fig. 6 becomes more frequent when the input signal is correlated, which drags down the average performance of APWL1 significantly. Finally, we note that the filtering performance for a group-sparse system (e.g., Fig. 1(b)) may be very different from that for a general sparse system. This will be investigated in Section IV-B.
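One way to produce the normalized AR(1) input of (31) is sketched below. We assume unit-variance Gaussian innovations $u_n$ and normalize analytically by $\sqrt{1 - 0.8^2}$, starting from the stationary distribution so that there is no transient. Normalizing by the sample variance would be an equally valid reading of "normalized to unit variance". The helper name and seed are hypothetical.

```python
import random


def ar1_input(n, a=0.8, seed=0):
    """Generate x_n = a x_{n-1} + u_n, eq. (31), u_n i.i.d. N(0, 1), scaled to unit variance.

    The unscaled process has stationary variance 1 / (1 - a^2), so multiplying by
    sqrt(1 - a^2) gives unit variance.
    """
    rng = random.Random(seed)
    scale = (1.0 - a * a) ** 0.5
    s = rng.gauss(0.0, 1.0) / scale       # s_0 drawn from the stationary distribution
    out = []
    for _ in range(n):
        s = a * s + rng.gauss(0.0, 1.0)  # eq. (31)
        out.append(scale * s)            # normalized to unit variance
    return out


x = ar1_input(100_000)
var = sum(v * v for v in x) / len(x)
lag1 = sum(x[i] * x[i - 1] for i in range(1, len(x))) / (len(x) - 1)
print(f"variance ~ {var:.3f}, lag-1 correlation ~ {lag1:.3f}")
```

Because the normalization is a fixed scale factor, the lag-1 correlation stays at $0.8$. This correlation is what slows plain NLMS and, per Fig. 7 and Fig. 8, APWL1.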

3) Tracking performance: Finally, we study the tracking performance of the proposed filters. The time-varying system is initialized using the same parameters as were used to generate Fig. 3. At the 750th iteration the system undergoes a sudden change, in which all the active coefficients are left-shifted by 10 taps. We use white input signals to excite the unknown system, and all the filter parameters are set in an identical manner to Section IV-A1. The simulation is repeated 100 times and the averaged result is shown in Fig. 9. It can be observed that both RZA-NLMS and APWL1 ($q = 10$) achieve better tracking performance than the conventional NLMS.

Fig. 6. Two different trials of RZA-NLMS and APWL1 for white input signals. APWL1 exhibits unstable convergence.

Fig. 7. Correlated input signals: performance comparison for different filters, where RZA-NLMS 1 and RZA-NLMS 2 use (12) and (29) to determine $\rho_n$, respectively.

Fig. 8. Two different trials of RZA-NLMS and APWL1 for correlated input signals.

Fig. 9. Comparison of tracking performances when the input signal is white.
Fig. 10. The group-sparse system used for simulations. There are two active blocks; each of them contains 15 non-zero coefficients.

Fig. 11. MSD comparison for the group-sparse system for white input signals.

B. Identifying a group-sparse system 

Here we test the performance of the group-sparse LMS filters developed in Section III-B. The unknown system contains 200 coefficients that are distributed into two groups. The locations of the two groups are randomly selected; they start at the 36th tap and the 107th tap, respectively. Both groups contain 15 coefficients, whose values are randomly drawn from a standard Gaussian distribution. Fig. 10 shows the response of the true system.

The input signal $\{x_n\}$ is initially set to an i.i.d. Gaussian process, and the variance of the observation noise is $\sigma_v^2 = 0.1$. Three filters, GRZA-NLMS, RZA-NLMS and NLMS, are implemented, with NLMS treated as a benchmark. In GRZA-NLMS, we divide the 200 coefficients equally into 20 groups, each containing 10 coefficients. The step sizes $\mu_n$ of the three filters are all set according to (6) with $\alpha_n = 1$. We use (12) to calculate $\rho_n$, where $\eta_n$ is set to 30 (the number of non-zero coefficients) for RZA-NLMS and to 2 (the number of non-zero blocks) for GRZA-NLMS. We repeat the simulation 200 times, and the averaged MSD is shown in Fig. 11. It can be seen that GRZA-NLMS and RZA-NLMS outperform the standard NLMS by 10 dB in steady-state MSD, while GRZA-NLMS improves upon RZA-NLMS only marginally. This is partially due to the fact that in the white input scenario each coefficient is updated in an independent manner.

We next consider the case of correlated input signals, where $\{x_n\}$ is generated by (31) and then normalized to have unit variance. The parameters for all the filters are set to the same values as in the white input example, and the averaged MSD curves are plotted in Fig. 12. In contrast to the white input example, here RZA-NLMS only slightly outperforms NLMS, but there is a significant improvement of GRZA-NLMS over RZA-NLMS. This demonstrates the power of promoting group sparsity, especially when the input signal is correlated.

Fig. 12. MSD comparison for the group-sparse system for correlated input signals.

Fig. 13. Tracking performance comparison for the group-sparse system for white input signals.

Finally, we evaluate the tracking performance of the adaptive filters. We use white signals as the system input and initialize the time-varying system using that in Fig. 10. At the 2000th iteration, the system response is right-shifted by 50 taps, while the values of the coefficients inside each block are unaltered. We then keep the block locations and reset the values of the non-zero coefficients randomly at the 4000th iteration. From Fig. 13 we observe that the tracking rates of RZA-NLMS and GRZA-NLMS are comparable when the system changes across blocks. GRZA-NLMS tracks better than RZA-NLMS when the system response changes only inside its active groups.

V. Conclusion 

In this paper we proposed a general class of LMS-type filters regularized by convex sparsifying penalties. We derived closed-form expressions for choosing the regularization parameter that guarantee provable dominance over conventional LMS filters. We applied the proposed regularized LMS filters to sparse and group-sparse system identification and demonstrated their performance using numerical simulations.

Our regularized LMS filter is derived from the LMS framework and inherits its simplicity, low computational cost, low memory requirements and robustness to parameter mismatch. It is likely that the convergence rate and steady-state performance can be improved by extension to second-order methods, such as RLS and Kalman filters. Efficient extensions of our results to sparse/group-sparse RLS filters are a worthy topic of future study.

VI. Appendix
A. Proof of Theorem 1

We prove Theorem 1 for LMS, i.e., the case in which the $\mu_n$ are constants. The NLMS case, where $\mu_n$ is determined by (6), can be derived in a similar manner.

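According to (3), (4) and (9),

$$\mathbf{w}_{n+1} - \mathbf{w} = (\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)(\mathbf{w}_n - \mathbf{w}) - \rho_n\partial f_n(\mathbf{w}_n) + \mu_n v_n\mathbf{x}_n. \tag{32}$$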



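Treating $\mathbf{w}_n$, $\mathbf{x}_n$ and $v_n$ as mutually independent (the standard independence assumption in LMS analysis), we have

$$\begin{aligned} E\{\|\mathbf{w}_{n+1} - \mathbf{w}\|_2^2 \,|\, \mathbf{w}_n\} ={}& (\mathbf{w}_n - \mathbf{w})^T E\{(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)^2\}(\mathbf{w}_n - \mathbf{w}) + \mu_n^2\sigma_v^2 E\{\|\mathbf{x}_n\|_2^2\} \\ &- 2\rho_n(\mathbf{w}_n - \mathbf{w})^T E\{\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T\}\partial f_n(\mathbf{w}_n) + \rho_n^2\|\partial f_n(\mathbf{w}_n)\|_2^2. \end{aligned} \tag{33}$$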

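As $\{x_n\}$ is a Gaussian i.i.d. process, $\mathbf{x}_n$ is a Gaussian random vector with mean zero and covariance $\sigma_x^2\mathbf{I}$. Thus,

$$E\{(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)^2\} = (1 - 2\sigma_x^2\mu_n + (N+2)\sigma_x^4\mu_n^2)\mathbf{I}, \tag{34}$$

$$E\{\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T\} = (1 - \sigma_x^2\mu_n)\mathbf{I}, \tag{35}$$

and

$$E\{\|\mathbf{x}_n\|_2^2\} = N\sigma_x^2, \tag{36}$$

where (34) follows from the Gaussian fourth-moment identity $E\{\|\mathbf{x}_n\|_2^2\mathbf{x}_n\mathbf{x}_n^T\} = (N+2)\sigma_x^4\mathbf{I}$.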

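Substituting (34), (35) and (36) into (33), we have

$$\begin{aligned} E\{\|\mathbf{w}_{n+1} - \mathbf{w}\|_2^2 \,|\, \mathbf{w}_n\} ={}& (1 - 2\sigma_x^2\mu_n + (N+2)\sigma_x^4\mu_n^2)\|\mathbf{w}_n - \mathbf{w}\|_2^2 + N\mu_n^2\sigma_x^2\sigma_v^2 \\ &+ 2\rho_n(1 - \sigma_x^2\mu_n)(\mathbf{w} - \mathbf{w}_n)^T\partial f_n(\mathbf{w}_n) + \rho_n^2\|\partial f_n(\mathbf{w}_n)\|_2^2. \end{aligned} \tag{37}$$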

As $f_n(\cdot)$ is a convex function, by the definition of the sub-gradient we have

$$(\mathbf{w} - \mathbf{w}_n)^T\partial f_n(\mathbf{w}_n) \le f_n(\mathbf{w}) - f_n(\mathbf{w}_n) \le \eta_n - f_n(\mathbf{w}_n). \tag{38}$$

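Therefore, since $\rho_n \ge 0$, and assuming $\mu_n\sigma_x^2 \le 1$ so that $1 - \sigma_x^2\mu_n \ge 0$,

$$\begin{aligned} E\{\|\mathbf{w}_{n+1} - \mathbf{w}\|_2^2 \,|\, \mathbf{w}_n\} \le{}& (1 - 2\sigma_x^2\mu_n + (N+2)\sigma_x^4\mu_n^2)\|\mathbf{w}_n - \mathbf{w}\|_2^2 + N\mu_n^2\sigma_x^2\sigma_v^2 \\ &- 2\rho_n(1 - \sigma_x^2\mu_n)(f_n(\mathbf{w}_n) - \eta_n) + \rho_n^2\|\partial f_n(\mathbf{w}_n)\|_2^2. \end{aligned} \tag{39}$$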

Define

$$C(\rho_n) = -2\rho_n(1 - \sigma_x^2\mu_n)(f_n(\mathbf{w}_n) - \eta_n) + \rho_n^2\|\partial f_n(\mathbf{w}_n)\|_2^2, \tag{40}$$



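and take the expectation on both sides of (39) with respect to $\mathbf{w}_n$ to obtain

$$E\{\|\mathbf{w}_{n+1} - \mathbf{w}\|_2^2\} \le (1 - 2\sigma_x^2\mu_n + (N+2)\sigma_x^4\mu_n^2)E\{\|\mathbf{w}_n - \mathbf{w}\|_2^2\} + N\mu_n^2\sigma_x^2\sigma_v^2 + E\{C(\rho_n)\}. \tag{41}$$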

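It is easy to check that $C(\rho_n) \le 0$ if $\rho_n \in [0, 2\rho_n^*]$, where $\rho_n^*$ is defined in (11). Indeed, $C$ is a convex quadratic in $\rho_n$ that vanishes at $\rho_n = 0$ and at $\rho_n = 2\rho_n^*$, and it is non-positive between these two roots. Therefore,

$$E\{\|\mathbf{w}_{n+1} - \mathbf{w}\|_2^2\} \le (1 - 2\sigma_x^2\mu_n + (N+2)\sigma_x^4\mu_n^2)E\{\|\mathbf{w}_n - \mathbf{w}\|_2^2\} + N\mu_n^2\sigma_x^2\sigma_v^2 \tag{42}$$

if $\rho_n \in [0, 2\rho_n^*]$. For the standard LMS, we have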

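$$E\{\|\mathbf{w}_{n+1}' - \mathbf{w}\|_2^2\} = (1 - 2\sigma_x^2\mu_n + (N+2)\sigma_x^4\mu_n^2)E\{\|\mathbf{w}_n' - \mathbf{w}\|_2^2\} + N\mu_n^2\sigma_x^2\sigma_v^2. \tag{43}$$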

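Therefore, under the condition that $E\{\|\mathbf{w}_0 - \mathbf{w}\|_2^2\} = E\{\|\mathbf{w}_0' - \mathbf{w}\|_2^2\}$, (10) can be obtained from (42) and (43) by a simple induction argument. The induction goes through because the common coefficient $1 - 2\sigma_x^2\mu_n + (N+2)\sigma_x^4\mu_n^2 = (1 - \sigma_x^2\mu_n)^2 + (N+1)\sigma_x^4\mu_n^2$ is nonnegative.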
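The Gaussian moment identities (34) and (36) are easy to check numerically. The Monte Carlo sketch below uses a hypothetical helper and arbitrary values $N = 4$, $\mu_n = 0.2$, $\sigma_x^2 = 1$, and averages the diagonal of $(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)^2$. For Gaussian $\mathbf{x}_n$, the fourth-moment identity $E\{\|\mathbf{x}_n\|_2^2\mathbf{x}_n\mathbf{x}_n^T\} = (N+2)\sigma_x^4\mathbf{I}$ predicts $1 - 2(0.2) + 6(0.2)^2 = 0.84$. A coefficient of $N$ instead of $N + 2$ would give $0.76$.

```python
import random


def check_moments(N=4, mu=0.2, sigma2=1.0, trials=100_000, seed=0):
    """Monte Carlo estimates of the average diagonal entry of E{(I - mu x x^T)^2}
    and of E{||x||_2^2}, for x ~ N(0, sigma2 * I)."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    diag_sum = 0.0
    sq_norm_sum = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, sd) for _ in range(N)]
        q = sum(v * v for v in x)  # ||x||_2^2
        # (I - mu x x^T)^2 = I - 2 mu x x^T + mu^2 ||x||^2 x x^T; average its diagonal entries
        diag_sum += sum(1.0 - 2.0 * mu * v * v + mu * mu * q * v * v for v in x) / N
        sq_norm_sum += q
    return diag_sum / trials, sq_norm_sum / trials


diag_est, norm_est = check_moments()
predicted_34 = 1 - 2 * 0.2 + (4 + 2) * 0.2 ** 2  # right-hand side of (34) with sigma_x^2 = 1
print(diag_est, predicted_34, norm_est)
```

The estimate of $E\{\|\mathbf{x}_n\|_2^2\}$ should likewise be close to $N\sigma_x^2 = 4$, as in (36).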



B. Proof of Theorem 2



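We start our proof from (32) and calculate the following conditional MSD:

$$E\{\|\mathbf{w}_{n+1} - \mathbf{w}\|_2^2 \,|\, \mathbf{w}_n, \mathbf{x}_n\} = (\mathbf{w}_n - \mathbf{w})^T(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)^2(\mathbf{w}_n - \mathbf{w}) + \mu_n^2\sigma_v^2\|\mathbf{x}_n\|_2^2 + D(\rho_n), \tag{44}$$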

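where

$$D(\rho_n) = 2\rho_n(\mathbf{w} - \mathbf{w}_n)^T(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)\partial f_n(\mathbf{w}_n) + \rho_n^2\|\partial f_n(\mathbf{w}_n)\|_2^2. \tag{45}$$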

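For the cross term $2\rho_n(\mathbf{w} - \mathbf{w}_n)^T(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)\partial f_n(\mathbf{w}_n)$, we have

$$\begin{aligned} &2\rho_n(\mathbf{w} - \mathbf{w}_n)^T(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)\partial f_n(\mathbf{w}_n) \\ &= 2\rho_n(\mathbf{w} - \mathbf{w}_n)^T\partial f_n(\mathbf{w}_n) + 2\rho_n\mu_n\mathbf{w}_n^T\mathbf{x}_n\cdot\mathbf{x}_n^T\partial f_n(\mathbf{w}_n) - 2\rho_n\mu_n\mathbf{w}^T\mathbf{x}_n\cdot\mathbf{x}_n^T\partial f_n(\mathbf{w}_n) \\ &\le 2\rho_n(\eta_n - f_n(\mathbf{w}_n)) + 2\rho_n\mu_n\mathbf{w}_n^T\mathbf{x}_n\cdot\mathbf{x}_n^T\partial f_n(\mathbf{w}_n) + 2\rho_n\mu_n|\mathbf{w}^T\mathbf{x}_n|\cdot|\mathbf{x}_n^T\partial f_n(\mathbf{w}_n)|, \end{aligned} \tag{46}$$

where the inequality uses (38) and $\rho_n, \mu_n \ge 0$.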

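We now establish an upper bound for $|\mathbf{w}^T\mathbf{x}_n|$. Indeed,

$$\begin{aligned} |\mathbf{w}^T\mathbf{x}_n| &\le \sum_{j=1}^{J}|\mathbf{w}_{I_j}^T\mathbf{x}_{n,I_j}| \le \sum_{j=1}^{J}\|\mathbf{w}_{I_j}\|_2\|\mathbf{x}_{n,I_j}\|_2 = \sum_{j=1}^{J}\beta_{n,j}\|\mathbf{w}_{I_j}\|_2\,\frac{\|\mathbf{x}_{n,I_j}\|_2}{\beta_{n,j}} \\ &\le \max_j\frac{\|\mathbf{x}_{n,I_j}\|_2}{\beta_{n,j}}\sum_{j=1}^{J}\beta_{n,j}\|\mathbf{w}_{I_j}\|_2 = f_n(\mathbf{w})\max_j\frac{\|\mathbf{x}_{n,I_j}\|_2}{\beta_{n,j}} \le \eta_n\max_j\frac{\|\mathbf{x}_{n,I_j}\|_2}{\beta_{n,j}}. \end{aligned} \tag{47}$$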
Substituting (46) and (47) into (45), we obtain

$$D(\rho_n) \le -2\rho_n(f_n(\mathbf{w}_n) - \eta_n - \mu_n r_n) + \rho_n^2\|\partial f_n(\mathbf{w}_n)\|_2^2, \tag{48}$$

where $r_n$ is defined in (30). Note that $D(\rho_n) \le 0$ if $\rho_n \in [0, 2\rho_n^*]$, with $\rho_n^*$ defined in (29). Hence

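$$E\{\|\mathbf{w}_{n+1} - \mathbf{w}\|_2^2 \,|\, \mathbf{w}_n, \mathbf{x}_n\} \le (\mathbf{w}_n - \mathbf{w})^T(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)^2(\mathbf{w}_n - \mathbf{w}) + \mu_n^2\sigma_v^2\|\mathbf{x}_n\|_2^2 \tag{49}$$

if $\rho_n \in [0, 2\rho_n^*]$. Therefore, using $\mathbf{w}_n = \mathbf{w}_n'$,

$$\begin{aligned} E\{\|\mathbf{w}_{n+1} - \mathbf{w}\|_2^2\} &\le E\{(\mathbf{w}_n - \mathbf{w})^T(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)^2(\mathbf{w}_n - \mathbf{w})\} + \sigma_v^2 E\{\mu_n^2\|\mathbf{x}_n\|_2^2\} \\ &= E\{(\mathbf{w}_n' - \mathbf{w})^T(\mathbf{I} - \mu_n\mathbf{x}_n\mathbf{x}_n^T)^2(\mathbf{w}_n' - \mathbf{w})\} + \sigma_v^2 E\{\mu_n^2\|\mathbf{x}_n\|_2^2\} \\ &= E\{\|\mathbf{w}_{n+1}' - \mathbf{w}\|_2^2\}, \end{aligned} \tag{50}$$

which proves Theorem 2.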
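The chain of inequalities in (47) rests on the Cauchy–Schwarz inequality within each group, followed by Hölder's inequality across groups. The randomized check below tests the resulting bound $|\mathbf{w}^T\mathbf{x}_n| \le f_n(\mathbf{w})\max_j\|\mathbf{x}_{n,I_j}\|_2/\beta_{n,j}$, using a hypothetical helper, arbitrary sizes and equal-size contiguous groups. It illustrates the inequality but does not prove it.

```python
import math
import random


def check_bound_47(trials=1000, N=12, group_size=3, delta=0.01, seed=0):
    """Randomly test |w^T x| <= f_n(w) * max_j ||x_{I_j}||_2 / beta_{n,j} (the core of (47)).

    Assumes N is a multiple of group_size (contiguous, equal-size groups).
    """
    rng = random.Random(seed)
    groups = [list(range(s, s + group_size)) for s in range(0, N, group_size)]

    def gnorm(v, idx):
        return math.sqrt(sum(v[i] ** 2 for i in idx))

    for _ in range(trials):
        w = [rng.gauss(0.0, 1.0) for _ in range(N)]     # plays the role of the true w
        w_n = [rng.gauss(0.0, 1.0) for _ in range(N)]   # current estimate, defines beta_{n,j}
        x = [rng.gauss(0.0, 1.0) for _ in range(N)]
        beta = [1.0 / (gnorm(w_n, idx) + delta) for idx in groups]    # eq. (26)
        f_w = sum(b * gnorm(w, idx) for b, idx in zip(beta, groups))  # f_n(w), eq. (25)
        bound = f_w * max(gnorm(x, idx) / b for b, idx in zip(beta, groups))
        if abs(sum(a * c for a, c in zip(w, x))) > bound * (1 + 1e-12):
            return False
    return True


print(check_bound_47())  # True: no counterexample found
```

With `group_size=1`, the same check covers the component-wise (RZA-LMS) case mentioned in Remark 4.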